Script-description Pair Extraction from Text Documents of English as Second Language Podcast

Authors

  • Hyungjong Noh
  • Minwoo Jeong
  • Sungjin Lee
  • Jonghoon Lee
  • Gary Geunbae Lee
Abstract

One of the most effective ways to learn a language is to have a conversation with a native speaker. However, this is often very expensive. A good alternative is to use Dialog-Based Computer Assisted Language Learning (DB-CALL) systems. The feedback quality in DB-CALL systems is very important. Therefore, to provide various expressions as feedback information, we propose a method that extracts pairs of script sentences and their description sentences from an English as a Second Language (ESL) podcast web site. A linear CRF classifier is used to find the corresponding description sentences, and several features are selected according to the characteristics of the ESL text documents. The experimental results show that the performance of our system is acceptable.
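As a rough illustration of how a linear-chain CRF labels a sequence of sentences, the sketch below runs Viterbi decoding over per-sentence features. It is not the authors' system: the feature names (`mentions_script_term`, `definition_cue`), the label set (`DESC`, `OTHER`), and all weights are hypothetical and hand-set, whereas a real CRF would learn its weights from annotated data. Only the decoding step is shown.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: DESC = sentence describes a script expression.
LABELS = ["OTHER", "DESC"]

# Hand-set weights for illustration only; a trained CRF would learn these.
EMISSION: Dict[Tuple[str, str], float] = {
    ("bias", "OTHER"): 1.0,
    ("mentions_script_term", "DESC"): 1.5,
    ("definition_cue", "DESC"): 1.0,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("DESC", "DESC"): 0.5,
    ("OTHER", "OTHER"): 0.5,
}


def sentence_features(sentence: str, script_terms: List[str]) -> List[str]:
    """Return hypothetical binary features for one explanation sentence."""
    lower = sentence.lower()
    feats = ["bias"]
    if any(term.lower() in lower for term in script_terms):
        feats.append("mentions_script_term")
    if " means " in f" {lower} " or "is when" in lower:
        feats.append("definition_cue")
    return feats


def emit(feats: List[str], label: str) -> float:
    """Sum the emission weights of the active features for a label."""
    return sum(EMISSION.get((f, label), 0.0) for f in feats)


def viterbi(feature_seq: List[List[str]]) -> List[str]:
    """Return the highest-scoring label sequence under the weights above."""
    if not feature_seq:
        return []
    scores = [{lab: emit(feature_seq[0], lab) for lab in LABELS}]
    backpointers: List[Dict[str, str]] = []
    for feats in feature_seq[1:]:
        row: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for lab in LABELS:
            # Best previous label for reaching `lab` at this position.
            prev = max(
                LABELS,
                key=lambda p: scores[-1][p] + TRANSITION.get((p, lab), 0.0),
            )
            row[lab] = (
                scores[-1][prev]
                + TRANSITION.get((prev, lab), 0.0)
                + emit(feats, lab)
            )
            ptr[lab] = prev
        scores.append(row)
        backpointers.append(ptr)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: scores[-1][lab])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def label_sentences(sentences: List[str], script_terms: List[str]) -> List[str]:
    """Label each sentence as DESC or OTHER for the given script terms."""
    return viterbi([sentence_features(s, script_terms) for s in sentences])
```

Calling `label_sentences` with a list of explanation sentences and the expressions taken from a script sentence returns one label per sentence; sentences labeled `DESC` would then be paired with that script sentence. The transition weights favor runs of consecutive description sentences, which is the kind of sequential dependency a chain-structured model can capture and an independent per-sentence classifier cannot.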


Similar resources

Extraction de corpus parallèle pour la traduction automatique depuis et vers une langue peu dotée. (Extraction a parallel corpus for machine translation from and to under-resourced languages)

Nowadays, machine translation has reached good results when applied to several language pairs such as English – French, English – Chinese, English – Spanish, etc. Empirical translation, particularly statistical machine translation allows us to build quickly a translation system if adequate data is available because statistical machine translation is based on models trained from large parallel b...


The Impact of Lemmatization in Word Alignment

The focus of this thesis is on examining whether word alignment results can be improved in precision and recall through lemmatization, and extraction of lemma dictionaries from the resulting links. Lemmas are extracted from existing lexical resources in order to replace word forms in two parallel corpora documents, one featuring the language pair English-Swedish and the other the language pair ...


Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure

Documents often have inherently parallel structure: they may consist of a text and commentaries, or an abstract and a body, or parts presenting alternative views on the same problem. Revealing relations between the parts by jointly segmenting and predicting links between the segments, would help to visualize such documents and construct friendlier user interfaces. To address this problem, we pr...


Rereading the Immovable Inscription Documents in the World Heritage Site of the Tabriz Historic Bazaar Complex

Immovable inscriptions are considered as one of the most important works and among the historical documents in cultural assets of our dear country, which were installed on selected parts of historical buildings and outstanding monuments and were always noticeable. The role of inscriptions as the basic and effective tools is important in terms of manifesting and implication of educational and ed...


DLOLIS-A: Description Logic based Text Ontology Learning

Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning based knowledge extraction. While most works in this field have been primarily statistical (known as light-weight Ontology Learning) not much attempt has been...



Publication date: 2010